{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-colab"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/pathwaycom/pathway-examples/blob/main/documentation/installation_first_steps.ipynb\" target=\"_parent\"><img src=\"https://pathway.com/assets/colab-badge.svg\" alt=\"Run In Colab\" class=\"inline\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# [Colab-specific instructions] Installing Pathway with Python 3.8+\n",
        "\n",
        "> In the cell below we install pathway into a Python 3.8+ Linux runtime.\n",
        "> Please:\n",
        "> 1. **Insert in the form below the pip install link** given to you with your beta access.\n",
        "> 2. **Run the colab notebook (Ctrl+F9)**, disregarding the 'not authored by Google' warning. **The installation and loading time is less than 1 minute**.\n"
      ],
      "metadata": {
        "id": "notebook-instructions"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#@title \u2699\ufe0f Pathway installer. Please provide the pip install link for Pathway:\n",
        "# Please copy here the installation line:\n",
        "PATHWAY_INSTALL_LINE='pip install https://packages.pathway.com/...' #@param {type:\"string\"}\n",
        "\n",
        "if PATHWAY_INSTALL_LINE.startswith('pip install '):\n",
        "    PATHWAY_INSTALL_LINE=PATHWAY_INSTALL_LINE[len('pip install '):]\n",
        "\n",
        "class InterruptExecution(Exception):\n",
        "    def _render_traceback_(self):\n",
        "        pass\n",
        "\n",
        "if '...' in PATHWAY_INSTALL_LINE or not PATHWAY_INSTALL_LINE.startswith('https://'):\n",
        "    print(\n",
        "        \"\u26d4 Please register at https://pathway.com/developers/documentation/introduction/installation-and-first-steps\\n\"\n",
        "        \"to Copy & Paste the Linux pip install line for Pathway!\"\n",
        "    )\n",
        "    raise InterruptExecution\n",
        "\n",
        "if 'macosx_11_0_arm64' in PATHWAY_INSTALL_LINE:\n",
        "    PATHWAY_INSTALL_LINE = PATHWAY_INSTALL_LINE.replace('macosx_11_0_arm64','manylinux_2_12_x86_64.manylinux2010_x86_64')\n",
        "DO_INSTALL = False\n",
        "import sys\n",
        "if sys.version_info >= (3, 8):\n",
        "    print(f'\u2705 Python {sys.version} is active.')\n",
        "    try:\n",
        "        import pathway as pw\n",
        "        print('\u2705 Pathway successfully imported.')\n",
        "    except:\n",
        "        DO_INSTALL = True\n",
        "else:\n",
        "    print(\"\u26d4 Pathway requires Python 3.8 or higher.\")\n",
        "    raise InterruptExecution\n",
        "\n",
        "if DO_INSTALL:\n",
        "    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null || echo \"\u231b Installing Pathway. This usually takes a few seconds...\"\n",
        "    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null || pip install {PATHWAY_INSTALL_LINE} 1>/dev/null 2>/dev/null\n",
        "    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null || echo \"\u26d4 Installation failed. Don't be shy to reach out to the community at https://pathway.com !\"\n",
        "    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null && echo \"\u2705 All installed. Enjoy Pathway!\"\n"
      ],
      "metadata": {
        "id": "pip-installation-pathway",
        "cellView": "form"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "0f2d62ce",
      "metadata": {},
      "source": [
        "\n",
        "# Getting started with Pathway\n",
        "\n",
        "In the following, you can find instructions on how to start using Pathway.\n",
        "\n",
        "## How to install Pathway"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ff887d05",
      "metadata": {},
      "source": [
        "You can download the current Pathway release, which is now available in Open Beta\n",
        "on a free-to-use [license](https://pathway.com/license):\n",
        "::pip-install\n",
        "::\n",
        "on a Python 3.8+ installation, and we are ready to roll!\n",
        "\n",
        "To use Pathway, we only need to import it:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "id": "12cdf737",
      "metadata": {
        "execution": {
          "iopub.execute_input": "2023-01-09T10:50:58.633792Z",
          "iopub.status.busy": "2023-01-09T10:50:58.633421Z",
          "iopub.status.idle": "2023-01-09T10:50:59.972167Z",
          "shell.execute_reply": "2023-01-09T10:50:59.971519Z"
        }
      },
      "outputs": [],
      "source": [
        "import pathway as pw"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "11cb1dd9",
      "metadata": {},
      "source": [
        "## How to connect to your first table\n",
        "\n",
        "The first thing you need to do is to access the data you want to manipulate.\n",
        "Pathway provides many connectors to access your data and manipulate them.\n",
        "\n",
        "As an example, let's load a table using a csv connector:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "id": "3b92e6d2",
      "metadata": {
        "execution": {
          "iopub.execute_input": "2023-01-09T10:50:59.976040Z",
          "iopub.status.busy": "2023-01-09T10:50:59.975642Z",
          "iopub.status.idle": "2023-01-09T10:50:59.984427Z",
          "shell.execute_reply": "2023-01-09T10:50:59.983918Z"
        }
      },
      "outputs": [],
      "source": [
        "\n",
        "table_dogs = pw.debug.table_from_markdown(\n",
        "    \"\"\"\n",
        "    | name  | age\n",
        "  1 | Ace   | 8\n",
        "  2 | Bella | 5\n",
        "  3 | Coco  | 13\n",
        " \"\"\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "12ae58a4",
      "metadata": {},
      "source": [
        "Now the table is loaded. But if we try to print it, we obtain a very generic output: `Table['age', 'name']`\n",
        "\n",
        "That's perfectly normal: as explained in our [introduction to programming in Pathway](https://pathway.com/developers/documentation/introduction/key-concepts), Pathway is used to schedule the operations that will be later performed in realtime by the runtime engine. To process the actual data in our example, we need to use debug function called `compute_and_print`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "id": "0a3e4146",
      "metadata": {
        "execution": {
          "iopub.execute_input": "2023-01-09T10:50:59.987025Z",
          "iopub.status.busy": "2023-01-09T10:50:59.986631Z",
          "iopub.status.idle": "2023-01-09T10:50:59.991659Z",
          "shell.execute_reply": "2023-01-09T10:50:59.991165Z"
        }
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | name  | age\n",
            "^2TMTFGY... | Ace   | 8\n",
            "^YHZBTNY... | Bella | 5\n",
            "^SERVYWW... | Coco  | 13\n"
          ]
        }
      ],
      "source": [
        "pw.debug.compute_and_print(table_dogs)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "aba96e7c",
      "metadata": {},
      "source": [
        "## Some basic operations using Pathway\n",
        "\n",
        "Now that we have a table, we are going to do some basic operations on it.\n",
        "You can find the full list of the supported operations in our [API documentation](https://pathway.com/developers/documentation/api-docs/pathway).\n",
        "\n",
        "The first thing we may want, is to filter on the age and keep only the dogs younger than 10 years old.\n",
        "We can use the operator `filter` on the column 'age'.\n",
        "To access a column, we can either use the notation `table_name.column_name` or use the more generic `table['column_name']`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "id": "33a5beb2",
      "metadata": {
        "execution": {
          "iopub.execute_input": "2023-01-09T10:50:59.994050Z",
          "iopub.status.busy": "2023-01-09T10:50:59.993684Z",
          "iopub.status.idle": "2023-01-09T10:50:59.999837Z",
          "shell.execute_reply": "2023-01-09T10:50:59.999350Z"
        }
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | name  | age\n",
            "^2TMTFGY... | Ace   | 8\n",
            "^YHZBTNY... | Bella | 5\n"
          ]
        }
      ],
      "source": [
        "table_dogs_young = table_dogs.filter(\n",
        "    table_dogs.age <= 10\n",
        ")  # table_dogs['age'] also works\n",
        "pw.debug.compute_and_print(table_dogs_young)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "90e50666",
      "metadata": {},
      "source": [
        "We can also apply a function to a given column.\n",
        "Let's say that we want to change the value of a column.\n",
        "Due to an error in rounding, all the age values are wrong and should be decreased by one.\n",
        "We can modify the table using the `apply` operation:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "id": "c609def5",
      "metadata": {
        "execution": {
          "iopub.execute_input": "2023-01-09T10:51:00.002277Z",
          "iopub.status.busy": "2023-01-09T10:51:00.001885Z",
          "iopub.status.idle": "2023-01-09T10:51:00.007104Z",
          "shell.execute_reply": "2023-01-09T10:51:00.006630Z"
        }
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | name  | age\n",
            "^2TMTFGY... | Ace   | 7\n",
            "^YHZBTNY... | Bella | 4\n",
            "^SERVYWW... | Coco  | 12\n"
          ]
        }
      ],
      "source": [
        "table_dogs_corrected = table_dogs.select(\n",
        "    table_dogs.name, age=pw.apply((lambda x: x - 1), table_dogs[\"age\"])\n",
        ")\n",
        "pw.debug.compute_and_print(table_dogs_corrected)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "001545d6",
      "metadata": {},
      "source": [
        "What happens here is that we select from `table_dogs` the column 'name' and a column 'age' which is obtained by the operator `apply`: `pw.apply(f,col)` applies `f` to each entry in `col` (there may be several such columns)."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "07cf3016",
      "metadata": {},
      "source": [
        "To do more complicated operations, we may need a second table:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "id": "d0f4b154",
      "metadata": {
        "execution": {
          "iopub.execute_input": "2023-01-09T10:51:00.009525Z",
          "iopub.status.busy": "2023-01-09T10:51:00.009176Z",
          "iopub.status.idle": "2023-01-09T10:51:00.017234Z",
          "shell.execute_reply": "2023-01-09T10:51:00.016707Z"
        }
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | name  | owner\n",
            "^2TMTFGY... | Ace   | Alice\n",
            "^YHZBTNY... | Bella | Bob\n",
            "^SERVYWW... | Coco  | Alice\n"
          ]
        }
      ],
      "source": [
        "table_dogs_owners = pw.debug.table_from_markdown(\n",
        "    \"\"\"\n",
        "    | name  | owner\n",
        "  1 | Ace   | Alice\n",
        "  2 | Bella | Bob\n",
        "  3 | Coco  | Alice\n",
        " \"\"\"\n",
        ")\n",
        "pw.debug.compute_and_print(table_dogs_owners)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "db7f7e9a",
      "metadata": {},
      "source": [
        "We can build a table with both information using the operator `join`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "id": "4c9fb868",
      "metadata": {
        "execution": {
          "iopub.execute_input": "2023-01-09T10:51:00.019740Z",
          "iopub.status.busy": "2023-01-09T10:51:00.019469Z",
          "iopub.status.idle": "2023-01-09T10:51:00.026382Z",
          "shell.execute_reply": "2023-01-09T10:51:00.025879Z"
        }
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | name  | age | owner\n",
            "^VJ3K9DF... | Ace   | 7   | Alice\n",
            "^V1RPZW8... | Bella | 4   | Bob\n",
            "^R0GE4WM... | Coco  | 12  | Alice\n"
          ]
        }
      ],
      "source": [
        "table_dogs_full = table_dogs_corrected.join(\n",
        "    table_dogs_owners, table_dogs_corrected.name == table_dogs_owners.name\n",
        ").select(table_dogs_corrected.name, table_dogs_corrected.age, table_dogs_owners.owner)\n",
        "pw.debug.compute_and_print(table_dogs_full)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "13b01d1c",
      "metadata": {},
      "source": [
        "## To go further\n",
        "\n",
        "Now you know the basis of Pathway! If you want to learn more about the Pathway, you can check out our [survival guide](https://pathway.com/developers/documentation/table-operations/survival-guide/), our manual about [joins](https://pathway.com/developers/documentation/table-operations/join-manual), or our introduction about [transformers](https://pathway.com/developers/documentation/transformer-classes/transformer-intro).\n",
        "\n",
        "As we continue you will see some more advanced programming constructs which provide a lot of flexibility to Pathway:\n",
        "* Applying Machine Learning to data tables.\n",
        "* The ability to do iteration and recursion."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ceb51e17",
      "metadata": {},
      "source": [
        "We will also use Pathway connectors to external data sources (for data inputs) and sinks (for data outputs)."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "54fd8928",
      "metadata": {},
      "source": [
        "This, and a lot more, is covered in recipes in the Pathway cookbook - try these for a start:\n",
        "* [Detect suspicious user activity](https://pathway.com/developers/tutorials/suspicious_activity_tumbling_window).\n",
        "* [Find the time elapsed between events in an event stream](https://pathway.com/developers/tutorials/event_stream_processing_time_between_occurrences).\n",
        "* [Compute the PageRank of a network](https://pathway.com/developers/tutorials/pagerank)."
      ]
    }
  ],
  "metadata": {
    "jupytext": {
      "cell_metadata_filter": "-all",
      "main_language": "python",
      "notebook_metadata_filter": "-all"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.8"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}